Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version
We are witnessing increasing interest in the effective use of road networks.
For example, to enable effective vehicle routing, weighted-graph models of
transportation networks are used, where the weight of an edge captures some
cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions
or travel time. It is a precondition to using a graph model for routing that
all edges have weights. Weights that capture travel times and GHG emissions can
be extracted from GPS trajectory data collected from the network. However, GPS
trajectory data typically lack the coverage needed to assign weights to all
edges. This paper formulates and addresses the problem of annotating all edges
in a road network with travel cost based weights from a set of trips in the
network that cover only a small fraction of the edges, each with an associated
ground-truth travel cost. A general framework is proposed to solve the problem.
Specifically, the problem is modeled as a regression problem and solved by
minimizing a judiciously designed objective function that takes into account
the topology of the road network. In particular, the use of weighted PageRank
values of edges is explored for assigning appropriate weights to all edges, and
the property of directional adjacency of edges is also taken into account to
assign weights. Empirical studies with weights capturing travel time and GHG
emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark)
offer insight into the design properties of the proposed techniques and offer
evidence that the techniques are effective.
Comment: This is an extended version of "Using Incomplete Information for
Complete Weight Annotation of Road Networks," which has been accepted for
publication in IEEE TKD
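The edge-level PageRank idea can be illustrated with a toy computation: treat each road edge as a node of a line graph, link edges under directional adjacency (the head of one edge meets the tail of the next), and run power iteration. This is a minimal sketch with made-up edges; the paper's weighted PageRank values and regression objective are not reproduced here.

```python
from collections import defaultdict

# Toy directed road network: each tuple is an edge (tail, head).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "d"), ("d", "a")]

# Directional adjacency: edge (u, v) links to edge (v, w) in the line graph.
succ = defaultdict(list)
for e in edges:
    for f in edges:
        if e[1] == f[0]:
            succ[e].append(f)

def edge_pagerank(edges, succ, d=0.85, iters=100):
    """Plain power-iteration PageRank over the edge (line) graph."""
    n = len(edges)
    pr = {e: 1.0 / n for e in edges}
    for _ in range(iters):
        nxt = {e: (1 - d) / n for e in edges}
        for e in edges:
            out = succ[e]
            if out:
                for f in out:
                    nxt[f] += d * pr[e] / len(out)
            else:  # dangling edge: spread its mass uniformly
                for f in edges:
                    nxt[f] += d * pr[e] / n
        pr = nxt
    return pr

pr = edge_pagerank(edges, succ)  # one PageRank value per road edge
```

Edges lying on many well-travelled paths accumulate higher values, which is the intuition behind using edge PageRank to transfer weights to uncovered edges.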
Document Simplicial Complex
A k-simplex is defined as the k-dimensional geometric structure given by the
convex hull of k+1 points. Given k+1 affinely independent points
x_0, ..., x_k in R^k, the set
\[
C = \left\{ a_0 x_0 + \cdots + a_k x_k \;\middle|\; \sum_{i=0}^{k} a_i = 1 \text{ and } a_i \ge 0 \text{ for all } i \right\}
\]
is defined as the k-simplex determined by them. The simplex is a basic building
block in topology. A collection of simplexes (or simplices) satisfying certain
conditions is called a geometric simplicial complex, which helps to analyze a
geometric structure at a larger scale. An abstract simplicial complex is a
purely combinatorial description of the geometric notion of a simplicial
complex, consisting of a family of non-empty finite sets closed under the
operation of taking non-empty subsets.
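The closure property in the last sentence is easy to make concrete: an abstract simplicial complex can be generated from its maximal simplices by taking all non-empty subsets. A minimal sketch (vertex names are illustrative):

```python
from itertools import combinations

def closure(maximal_simplices):
    """All non-empty subsets of the given maximal simplices,
    i.e. the abstract simplicial complex they generate."""
    complex_ = set()
    for simplex in maximal_simplices:
        verts = tuple(sorted(simplex))
        for r in range(1, len(verts) + 1):
            complex_.update(combinations(verts, r))
    return complex_

# A single 2-simplex (triangle) generates 3 vertices, 3 edges, and itself.
K = closure([{"x0", "x1", "x2"}])
```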
A text document can be visualized as a geometric structure in topology. A
document is defined as a collection of words, where each word is part of a
vocabulary and carries a certain meaning, and an n-gram is a contiguous
sequence of n items from a given sample of text. Using the n-gram concept to
define a simplex, we can construct an abstract simplicial complex out of every
text document. In this model, every simplex captures the local structure or
behavior, while the document simplicial complex, the collection of all the
(n-1)-simplexes, captures the global behavior of the document. We will study
this assuming we have a bag of documents, i.e., the universal set of documents.
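The construction described above can be sketched directly: each n-gram contributes a simplex on its word set, and closing under non-empty subsets yields the document simplicial complex. Tokenization below is a naive whitespace split, purely for illustration:

```python
from itertools import combinations

def ngrams(tokens, n):
    """Contiguous length-n word sequences of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_complex(text, n=3):
    """Each n-gram's word set is an (n-1)-simplex; close under subsets."""
    tokens = text.lower().split()
    simplices = set()
    for gram in ngrams(tokens, n):
        verts = tuple(sorted(set(gram)))
        for r in range(1, len(verts) + 1):
            simplices.update(combinations(verts, r))
    return simplices

K = document_complex("after clearing high school one joins college")
```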
The aim of this thesis is to understand the abstract structure admitted by text
documents in order to find similar documents more accurately within a given
family of text documents. In our discussion, we will visualize a document as a
geometric entity and use this representation to speed up querying: given a
query document, one can find the semantically similar documents more
efficiently, in terms of both time and similarity. For example, given the set
of documents {1. "after clearing high school one joins college", 2. "College
can be joined only after passing high school", 3. "High school and college must
be attended by everyone"}, documents 1 and 2 are more semantically similar than
1 and 3 or 2 and 3.
After a brief glance at abstract topology, we study the topological structure
and behavior of text documents. A novel representation of documents is given in
this thesis. Using this new structure, we represent each text document as a
geometric entity that can be further analyzed using topological tools. Using
the Earth Mover's distance and the Hausdorff distance, we give a new
formulation for fetching semantically similar documents for a given query. To
represent documents as mathematical structures in some R^k, we use the
Word2Vec model to find a vector representation of each word in a text document.
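One of the two distances just mentioned, the Hausdorff distance between documents-as-point-sets, can be sketched as follows; the 2-dimensional vectors are made-up stand-ins for Word2Vec word embeddings:

```python
import math

def directed_hausdorff(A, B):
    """sup over a in A of the distance from a to the nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Two toy "documents", each a set of word vectors.
doc1 = [(0.0, 0.0), (1.0, 0.0)]
doc2 = [(0.0, 0.1), (1.0, 0.0), (3.0, 0.0)]
d = hausdorff(doc1, doc2)
```

The distance is dominated by the word of `doc2` at (3.0, 0.0) that has no nearby counterpart in `doc1`, which is exactly the behavior that makes Hausdorff distance sensitive to semantically unmatched words.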
Self-Supervised Few-Shot Learning on Point Clouds
The increased availability of massive point clouds coupled with their utility
in a wide variety of applications such as robotics, shape synthesis, and
self-driving cars has attracted increased attention from both industry and
academia. Recently, deep neural networks operating on labeled point clouds have
shown promising results on supervised learning tasks like classification and
segmentation. However, supervised learning leads to the cumbersome task of
annotating the point clouds. To combat this problem, we propose two novel
self-supervised pre-training tasks that encode a hierarchical partitioning of
the point clouds using a cover-tree, where point cloud subsets lie within balls
of varying radii at each level of the cover-tree. Furthermore, our
self-supervised learning network is restricted to pre-train on the support set
(comprising the scarce training examples) used to train the downstream network
in a few-shot learning (FSL) setting. Finally, the fully-trained
self-supervised network's point embeddings are input to the downstream task's
network. We present a comprehensive empirical evaluation of our method on both
downstream classification and segmentation tasks and show that supervised
methods pre-trained with our self-supervised learning method significantly
improve the accuracy of state-of-the-art methods. Additionally, our method
outperforms previous unsupervised methods in downstream classification tasks.
Comment: Accepted at NeurIPS 202
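The hierarchical ball structure can be loosely illustrated by a greedy partition of a point cloud into balls of halving radii; this simplification is not the paper's cover-tree code, and the point data, radii, and names below are invented:

```python
import math
import random

def partition(points, radius, min_radius=0.25):
    """Greedily cover `points` with balls of the given radius, then
    recurse into each ball with half the radius (cover-tree flavor)."""
    if radius < min_radius or len(points) <= 1:
        return {"points": points, "children": []}
    centers, balls = [], []
    for p in points:
        for i, c in enumerate(centers):
            if math.dist(p, c) <= radius:
                balls[i].append(p)
                break
        else:  # no existing ball contains p: open a new one centered at p
            centers.append(p)
            balls.append([p])
    children = [partition(b, radius / 2, min_radius) for b in balls]
    return {"points": points, "children": children}

random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(50)]
tree = partition(cloud, radius=1.0)
```

Each level assigns every point to exactly one ball, so subsets at one level partition their parent, which is the property the pre-training tasks exploit.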
BERTops: Studying BERT Representations under a Topological Lens
Proposing scoring functions to effectively understand, analyze and learn various properties of high dimensional hidden representations of large-scale transformer models like BERT can be a challenging task. In this work, we explore a new direction by studying the topological features of BERT hidden representations using persistent homology (PH). We propose a novel scoring function named 'persistence scoring function (PSF)' which: (i) accurately captures the homology of the high-dimensional hidden representations and correlates well with the test set accuracy of a wide range of datasets and outperforms existing scoring metrics, (ii) captures interesting post fine-tuning 'per-class' level properties from both qualitative and quantitative viewpoints, (iii) is more stable to perturbations as compared to the baseline functions, which makes it a very robust proxy, and (iv) finally, also serves as a predictor of the attack success rates for a wide category of black-box and white-box adversarial attack methods. Our extensive correlation experiments demonstrate the practical utility of PSF on various NLP tasks relevant to BERT. Code is available at https://github.com/chauhanjatin10/BERTops. © 2022 IEEE
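As background for readers unfamiliar with persistent homology, the 0-dimensional part of PH on a point set reduces to single-linkage clustering: the minimum-spanning-tree edge lengths are the finite death times of the diagram (all births are at 0). A stdlib sketch of that ingredient (PSF itself is the paper's scoring function and is not reproduced here):

```python
import math

def mst_edge_lengths(points):
    """Prim's algorithm; returns the sorted MST edge lengths, which are
    the finite death times of the 0-dimensional persistence diagram."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest connection of each point to the tree
    best[0] = 0.0
    lengths = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if best[u] > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(lengths)

# Two nearby points and one outlier: one short and one long death time.
deaths = mst_edge_lengths([(0.0, 0.0), (0.0, 1.0), (5.0, 0.0)])
```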
Improving Data Quality by Leveraging Statistical Relational Learning
Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common
approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate,
and missing data.
missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints
within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational
learning (SRL). We argue that a formalism - Markov logic - is a natural fit for modeling data quality rules. Our approach
allows for the usage of probabilistic joint inference over interleaved data cleaning rules to improve data quality. Furthermore, it
obliterates the need to specify the order of rule execution. We describe how data quality rules expressed as formulas in first-order
logic directly translate into the predictive model in our SRL framework.
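The flavor of the approach can be sketched by scoring candidate worlds against weighted rules, Markov-logic style; the records, the single rule, and its weight are invented for illustration, and a real system would use a proper grounding and inference engine:

```python
# Ground facts: (name, email) records, plus a candidate duplicate labeling.
records = {
    1: ("Ann Lee", "ann@x.com"),
    2: ("A. Lee", "ann@x.com"),
    3: ("Bo Yu", "bo@y.com"),
}
candidate = {(1, 2): True, (1, 3): False, (2, 3): False}

def rule_same_email(i, j, dup):
    """Soft rule: same email => duplicate. On bools, a <= b is a => b."""
    return (records[i][1] == records[j][1]) <= dup

weights = {"same_email": 2.0}

def world_score(dup_labels):
    """Sum of weights of satisfied ground clauses (higher = more probable)."""
    score = 0.0
    for (i, j), dup in dup_labels.items():
        if rule_same_email(i, j, dup):
            score += weights["same_email"]
    return score

good = world_score(candidate)
bad = world_score({(1, 2): False, (1, 3): False, (2, 3): False})
```

The labeling that marks records 1 and 2 as duplicates satisfies all three ground clauses and scores higher, so joint inference would prefer it; with several interleaved rules, the same scoring removes any need to fix a rule execution order.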
Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
The recent proliferation of knowledge graphs
(KGs) coupled with incomplete or partial information, in the form of missing relations
(links) between entities, has fueled a lot of
research on knowledge base completion (also
known as relation prediction). Several recent works suggest that convolutional neural
network (CNN) based models generate richer
and more expressive feature embeddings and
hence also perform well on relation prediction.
However, we observe that these KG embeddings treat triples independently and thus fail
to cover the complex and hidden information
that is inherently implicit in the local neighborhood surrounding a triple. To this end, our
paper proposes a novel attention-based feature
embedding that captures both entity and relation features in any given entity’s neighborhood. Additionally, we also encapsulate relation clusters and multi-hop relations in our
model. Our empirical study offers insights
into the efficacy of our attention-based model
and we show marked performance gains in
comparison to state-of-the-art methods on all
datasets.
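The core aggregation step of such attention-based embeddings can be sketched as a softmax-weighted sum over a node's neighboring triples; the feature vectors and scoring vector below are toy stand-ins, not trained parameters:

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def aggregate(neighbors, a):
    """Attention over neighbor triples: score each concatenated
    [h || r || t] feature vector, softmax, then weighted-sum."""
    scores = [leaky_relu(sum(ai * xi for ai, xi in zip(a, x)))
              for x in neighbors]
    alphas = softmax(scores)
    dim = len(neighbors[0])
    return [sum(al * x[i] for al, x in zip(alphas, neighbors))
            for i in range(dim)]

# Two toy neighbor-triple feature vectors and a toy scoring vector.
neighbors = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9]]
a = [0.3, -0.1, 0.4, 0.2]
emb = aggregate(neighbors, a)
```

Because the attention weights form a convex combination, the aggregated embedding stays inside the span of its neighborhood features while emphasizing the most relevant triples.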